@vchuravy
Member

No description provided.

@vchuravy vchuravy requested a review from christiangnrd June 10, 2025 08:42
@vchuravy
Member Author

@maleadt Currently we only run the KA tests for CUDA, but should we also run some GPUArrays tests to ensure that KA doesn't break things?

@github-actions
Contributor

github-actions bot commented Jun 10, 2025

Benchmark Results

| Benchmark | main | 4c2fc9f... | main / 4c2fc9f... |
|---|---|---|---|
| saxpy/default/Float32/1024 | 0.0435 ± 0.029 ms | 0.0477 ± 0.03 ms | 0.912 ± 0.82 |
| saxpy/default/Float32/1048576 | 0.457 ± 0.021 ms | 0.445 ± 0.023 ms | 1.03 ± 0.072 |
| saxpy/default/Float32/16384 | 0.057 ± 0.026 ms | 0.056 ± 0.026 ms | 1.02 ± 0.67 |
| saxpy/default/Float32/2048 | 0.0455 ± 0.026 ms | 0.045 ± 0.026 ms | 1.01 ± 0.83 |
| saxpy/default/Float32/256 | 0.0434 ± 0.03 ms | 0.0572 ± 0.029 ms | 0.759 ± 0.64 |
| saxpy/default/Float32/262144 | 0.157 ± 0.025 ms | 0.155 ± 0.026 ms | 1.01 ± 0.23 |
| saxpy/default/Float32/32768 | 0.0614 ± 0.026 ms | 0.0599 ± 0.026 ms | 1.03 ± 0.62 |
| saxpy/default/Float32/4096 | 0.0485 ± 0.023 ms | 0.0476 ± 0.023 ms | 1.02 ± 0.7 |
| saxpy/default/Float32/512 | 0.0448 ± 0.029 ms | 0.0593 ± 0.029 ms | 0.755 ± 0.62 |
| saxpy/default/Float32/64 | 0.0439 ± 0.029 ms | 0.0447 ± 0.029 ms | 0.981 ± 0.9 |
| saxpy/default/Float32/65536 | 0.0761 ± 0.026 ms | 0.074 ± 0.026 ms | 1.03 ± 0.5 |
| saxpy/default/Float64/1024 | 0.0441 ± 0.029 ms | 0.0465 ± 0.029 ms | 0.947 ± 0.86 |
| saxpy/default/Float64/1048576 | 0.56 ± 0.078 ms | 0.511 ± 0.061 ms | 1.1 ± 0.2 |
| saxpy/default/Float64/16384 | 0.0619 ± 0.026 ms | 0.0626 ± 0.026 ms | 0.99 ± 0.59 |
| saxpy/default/Float64/2048 | 0.045 ± 0.024 ms | 0.0445 ± 0.025 ms | 1.01 ± 0.8 |
| saxpy/default/Float64/256 | 0.0468 ± 0.029 ms | 0.0595 ± 0.029 ms | 0.786 ± 0.62 |
| saxpy/default/Float64/262144 | 0.171 ± 0.026 ms | 0.169 ± 0.027 ms | 1.01 ± 0.22 |
| saxpy/default/Float64/32768 | 0.0687 ± 0.025 ms | 0.0666 ± 0.025 ms | 1.03 ± 0.55 |
| saxpy/default/Float64/4096 | 0.0488 ± 0.024 ms | 0.0476 ± 0.023 ms | 1.03 ± 0.71 |
| saxpy/default/Float64/512 | 0.0437 ± 0.029 ms | 0.052 ± 0.029 ms | 0.841 ± 0.73 |
| saxpy/default/Float64/64 | 0.0417 ± 0.03 ms | 0.0451 ± 0.029 ms | 0.925 ± 0.89 |
| saxpy/default/Float64/65536 | 0.0873 ± 0.026 ms | 0.0853 ± 0.026 ms | 1.02 ± 0.43 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 0.043 ± 0.029 ms | 0.0487 ± 0.029 ms | 0.884 ± 0.79 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.454 ± 0.021 ms | 0.45 ± 0.023 ms | 1.01 ± 0.071 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 0.0534 ± 0.026 ms | 0.0527 ± 0.025 ms | 1.01 ± 0.69 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 0.0439 ± 0.026 ms | 0.043 ± 0.026 ms | 1.02 ± 0.87 |
| saxpy/static workgroup=(1024,)/Float32/256 | 0.0447 ± 0.027 ms | 0.0572 ± 0.027 ms | 0.781 ± 0.6 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.157 ± 0.027 ms | 0.153 ± 0.026 ms | 1.02 ± 0.25 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 0.06 ± 0.026 ms | 0.0572 ± 0.026 ms | 1.05 ± 0.66 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 0.0469 ± 0.025 ms | 0.046 ± 0.025 ms | 1.02 ± 0.78 |
| saxpy/static workgroup=(1024,)/Float32/512 | 0.0442 ± 0.028 ms | 0.0582 ± 0.028 ms | 0.758 ± 0.6 |
| saxpy/static workgroup=(1024,)/Float32/64 | 0.0493 ± 0.026 ms | 0.0536 ± 0.026 ms | 0.919 ± 0.66 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 0.0739 ± 0.026 ms | 0.0704 ± 0.025 ms | 1.05 ± 0.53 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 0.0428 ± 0.028 ms | 0.0454 ± 0.029 ms | 0.943 ± 0.87 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.57 ± 0.047 ms | 0.523 ± 0.07 ms | 1.09 ± 0.17 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 0.0544 ± 0.025 ms | 0.0545 ± 0.026 ms | 0.999 ± 0.66 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 0.044 ± 0.025 ms | 0.0437 ± 0.026 ms | 1.01 ± 0.84 |
| saxpy/static workgroup=(1024,)/Float64/256 | 0.0432 ± 0.027 ms | 0.0578 ± 0.027 ms | 0.747 ± 0.58 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.165 ± 0.03 ms | 0.166 ± 0.028 ms | 0.993 ± 0.25 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 0.0629 ± 0.026 ms | 0.0622 ± 0.025 ms | 1.01 ± 0.59 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 0.0473 ± 0.025 ms | 0.0461 ± 0.025 ms | 1.03 ± 0.77 |
| saxpy/static workgroup=(1024,)/Float64/512 | 0.0426 ± 0.028 ms | 0.0574 ± 0.028 ms | 0.741 ± 0.61 |
| saxpy/static workgroup=(1024,)/Float64/64 | 0.0462 ± 0.028 ms | 0.0602 ± 0.028 ms | 0.767 ± 0.58 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.0858 ± 0.027 ms | 0.0842 ± 0.027 ms | 1.02 ± 0.46 |
| time_to_load | 1.34 ± 0.013 s | 1.34 ± 0.01 s | 1 ± 0.012 |
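
For readers checking the last column: the reported ratios look consistent with first-order propagation of the two ± values, but that is an inference from the numbers, not something the bot output states. A minimal Python sketch under that assumption (the function name `ratio_with_uncertainty` is hypothetical, not part of any benchmarking tool):

```python
import math


def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return (a / b, uncertainty), assuming independent errors.

    Assumption: first-order error propagation for a quotient,
    sigma_r = |r| * sqrt((sigma_a / a)^2 + (sigma_b / b)^2).
    """
    r = a / b
    sigma_r = abs(r) * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, sigma_r


# Values taken from the saxpy/default/Float32/1048576 row (in ms).
r, s = ratio_with_uncertainty(0.457, 0.021, 0.445, 0.023)
print(f"{r:.3g} ± {s:.2g}")
```

For the small problem sizes the uncertainty is almost as large as the ratio itself, so ratios such as 0.759 ± 0.64 are well within noise under this reading.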

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy vchuravy mentioned this pull request Jun 10, 2025
7 tasks
@maleadt
Member

maleadt commented Jun 10, 2025

should we also run some GPUArrays tests to ensure that KA doesn't break things?

That would probably be smart, yeah. Maybe once the dust settles and JLArrays can use the new CPU back-end?

@vchuravy vchuravy merged commit bc646a4 into main Jun 10, 2025
20 of 27 checks passed
@vchuravy vchuravy deleted the vc/downstream_ci2 branch June 10, 2025 09:41
3 participants